1 Introduction

With a view to basic epidemiological parameters such as incidence, prevalence and
mortality of a disease, it has proven useful to consider so-called state models or
compartmental models. The model used here is also termed the illness-death model
(Kalbfleisch and Prentice, 2002, Fig. 8.4). It consists of the three states Normal,
Disease and Death and the transitions between these states. Normal means non-diseased
with respect to the disease under consideration. The numbers of persons in the Normal
and Disease states are denoted by S (susceptibles) and C (cases), respectively. The
transition intensities (synonymously: rates) are named as shown in Figure 1: i is the
incidence rate, and m_0 and m_1 are the mortality rates of the non-diseased and the
diseased persons, respectively. In general, the intensities depend on calendar time t,
age a and sometimes also on the duration d of the disease.




Figure 1: Illness-death model of a chronic disease with three states. Persons in the
state Normal are healthy with respect to the considered disease. In the state
Disease they suffer from the disease. In the most general case, the transition
rates depend on the calendar time t, age a, and in case of the disease-specific
mortality m_1 also on the disease's duration d.



When the rates do not depend on calendar time t, the model is called time-homogeneous.
Under the additional condition that there is no dependence on the duration, Murray and
Lopez (1994) considered a two-dimensional system of ordinary differential equations
(ODEs) that relates the changes in the numbers of healthy and diseased persons to the
rates of the in- and outflows of the corresponding states (see footnote 1):



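\begin{equation}
\begin{aligned}
\frac{dS}{da} &= -\bigl(i(a) + m_0(a)\bigr)\, S \\
\frac{dC}{da} &= i(a)\, S - m_1(a)\, C.
\end{aligned}
\tag{1}
\end{equation}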



1 Murray and Lopez did not report the exact equations, but from a publication two years
later it may be deduced that they used an approach similar to Eq. (1).

Age a plays the role of temporal progression. The linear system (1) looks relatively
harmless, but this impression is misleading. Mostly, only the age-specific mortality of
the general population is well known, and the rate m_1 is epidemiologically accessible
only as a relative risk. Then, the system becomes nonlinear.

Furthermore, the inclusion of the hypothetical values S and C is unsatisfactory. It
would be better if we had the age-specific prevalence p(a) := C(a) / (S(a) + C(a)) here,
which indeed can be achieved (Brinks, 2011).



What are the benefits of such ODEs? For smooth incidence and mortality rates plus an
initial condition, the age profile of the number of patients or of the prevalence is
uniquely determined. To state it clearly, the "forces" incidence and mortality uniquely
prescribe the prevalence, not only qualitatively but in quantitative terms. Here we
speak of the forward problem: we infer from the causes (the forces) to the effect,
namely the number of diseased persons. The reverse way, inferring from the numbers of
diseased persons to the incidence, is the inverse problem: we infer from the effect to
the cause.
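
The forward problem for system (1) can be illustrated numerically. The following is a
minimal sketch, not the computation used in this paper: the three rate functions are
hypothetical toy choices (a constant incidence, a Gompertz-type mortality and an assumed
relative mortality of 2), and the explicit Euler method is used only for simplicity.

```python
import numpy as np

# Hypothetical toy rates per person-year (assumptions for illustration only).
def incidence(a):
    return 1e-4                          # constant incidence rate i(a)

def mort_nondiseased(a):
    return np.exp(-10.0 + 0.09 * a)      # Gompertz-type mortality m_0(a)

def mort_diseased(a):
    return 2.0 * mort_nondiseased(a)     # m_1(a) = 2 * m_0(a), assumed relative mortality

def forward_prevalence(a_max=90.0, da=0.01):
    """Integrate system (1) by the explicit Euler method; return ages and p(a) = C/(S+C)."""
    n = int(round(a_max / da))
    ages = np.linspace(0.0, a_max, n + 1)
    S = np.empty(n + 1)
    C = np.empty(n + 1)
    S[0], C[0] = 1.0, 0.0                # initial condition: all non-diseased at age 0
    for k in range(n):
        i = incidence(ages[k])
        m0 = mort_nondiseased(ages[k])
        m1 = mort_diseased(ages[k])
        S[k + 1] = S[k] - da * (i + m0) * S[k]
        C[k + 1] = C[k] + da * (i * S[k] - m1 * C[k])
    return ages, C / (S + C)
```

Calling `forward_prevalence()` returns the age grid and the prevalence curve that the
chosen rates and the initial condition determine.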



This paper is structured as follows: in the next section we describe the illness-death
model of Figure 1 in terms of a new system of two stochastic differential equations (SDEs).
As an application, in Section 3 we solve a forward problem to estimate the age-specific
prevalence of systemic lupus erythematosus (SLE) in England and Wales from published
data. This allows calculation of the mean age at onset of SLE, the mean duration and
the burden of SLE in terms of diseased persons.



2 Stochastic description of the illness-death model 

What can be achieved in the domain of ODEs, namely dividing the number C(a) of the
diseased by the number S(a) + C(a) of the living to derive the prevalence at age a, is
not that easy for random variables. Distributions of quotients of stochastically
dependent random variables are problematic, so we have to model S and C bivariately.

Let X(a) := (S(a), C(a))^t be the composite vector (the superscript t denotes
transposition). For Δa > 0 define the vector ΔX of increments:

\begin{equation*}
\Delta X(a) := \bigl(S(a + \Delta a) - S(a),\; C(a + \Delta a) - C(a)\bigr)^t .
\end{equation*}

Now we follow the reasoning of (Allen, 1999) and (Allen, 2008), who have applied the
theory presented here in the field of infectious disease modeling.

Choose Δa > 0 so small that at most one person can change state. In accordance with
the definition of the rates i, m_0 and m_1, the following assumptions about the
probability distribution P(ΔX(a)) are made:

\begin{equation}
P\bigl(\Delta X(a) = (u, v)^t\bigr) =
\begin{cases}
m_0(a)\, S(a)\, \Delta a + o(\Delta a) & \text{if } (u, v) = (-1, 0) \\
m_1(a)\, C(a)\, \Delta a + o(\Delta a) & \text{if } (u, v) = (0, -1) \\
i(a)\, S(a)\, \Delta a + o(\Delta a) & \text{if } (u, v) = (-1, 1) \\
1 - \bigl[m_0(a)\, S(a) + m_1(a)\, C(a) + i(a)\, S(a)\bigr]\, \Delta a + o(\Delta a) & \text{if } (u, v) = (0, 0)
\end{cases}
\tag{2}
\end{equation}

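If we further assume that the increments are normally distributed, we get the expected
value

\begin{equation*}
E(\Delta X(a)) =
\begin{pmatrix}
-\bigl(m_0(a) + i(a)\bigr)\, S(a) \\
-m_1(a)\, C(a) + i(a)\, S(a)
\end{pmatrix}
\Delta a + o(\Delta a)
\end{equation*}

and covariance matrix

\begin{equation*}
V(\Delta X(a)) = E\bigl(\Delta X(a)\, \Delta X^t(a)\bigr) - E(\Delta X(a))\, E\bigl(\Delta X^t(a)\bigr)
\approx E\bigl(\Delta X(a)\, \Delta X^t(a)\bigr).
\end{equation*}

The matrix V is symmetric and positive definite. Hence, there is a uniquely determined
matrix square root V^{1/2}. Due to the normal distribution assumption, the vector X
fulfills

\begin{equation}
X(a + \Delta a) = X(a) + \Delta X(a) = X(a) + E(\Delta X(a)) + V(\Delta X(a))^{1/2}\, \xi,
\tag{3}
\end{equation}

where ξ = (ξ_1, ξ_2)^t has normally distributed components ξ_j ~ N(0, 1), j = 1, 2.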
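
For reference, the second moment appearing in the covariance matrix can be written out
explicitly. The following is a direct computation from the transition probabilities (2),
with terms of order o(Δa) dropped; it is a worked step added for illustration, and the
explicit matrix is not printed in the original text.

```latex
\begin{align*}
E\bigl(\Delta X\, \Delta X^t\bigr)
 &\approx \left[
   \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} m_0 S
 + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} m_1 C
 + \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} i S
   \right] \Delta a \\
 &= \begin{pmatrix} (m_0 + i)\, S & -i\, S \\ -i\, S & m_1 C + i\, S \end{pmatrix} \Delta a ,
\qquad
\det = \bigl(m_0 m_1 S C + m_0\, i\, S^2 + i\, m_1 S C\bigr)\, \Delta a^2 .
\end{align*}
```

The determinant is strictly positive whenever S > 0 and the rates i and m_0 are positive,
which is consistent with the positive definiteness stated above.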



Under certain smoothness conditions on the coefficient functions i, m_1 and m_0, the
difference equation (3) is an Euler approximation to the Itô SDE system



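\begin{equation}
\begin{aligned}
\frac{dS}{da} &= -\bigl(i(a) + m_0(a)\bigr)\, S + b_{11} \frac{dW_1}{da} + b_{12} \frac{dW_2}{da} \\
\frac{dC}{da} &= i(a)\, S - m_1(a)\, C + b_{21} \frac{dW_1}{da} + b_{22} \frac{dW_2}{da}.
\end{aligned}
\tag{4}
\end{equation}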



In this expression, W_1 and W_2 are independent Wiener processes (Kloeden and Platen,
1999) and the 2×2 matrix B = (b_ij) is the uniquely determined square root of the
covariance matrix divided by Δa:

\begin{equation*}
B = \bigl(V(\Delta X) / \Delta a\bigr)^{1/2} .
\end{equation*}
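
As an illustration, the matrix B can be computed numerically for a given state. The sketch
below assumes the covariance per unit age that follows from the transition probabilities
(2) when terms of order o(Δa) are dropped, with entries (m_0 + i) S, -i S and
m_1 C + i S. The function name `diffusion_matrix` and its arguments are hypothetical, and
the eigen-decomposition is just one convenient way to obtain the symmetric square root.

```python
import numpy as np

def diffusion_matrix(S, C, i, m0, m1):
    """Return the symmetric square root B of V/da at state (S, C) for rates i, m0, m1."""
    # Covariance per unit age implied by (2), neglecting o(da) terms (assumed form).
    V = np.array([[(m0 + i) * S, -i * S],
                  [-i * S, m1 * C + i * S]])
    # V is symmetric and positive semi-definite, so eigh applies.
    eigenvalues, Q = np.linalg.eigh(V)
    eigenvalues = np.clip(eigenvalues, 0.0, None)   # guard against tiny negative round-off
    return Q @ np.diag(np.sqrt(eigenvalues)) @ Q.T
```

For example, `diffusion_matrix(1000.0, 10.0, 1e-3, 0.01, 0.02)` returns a symmetric matrix
whose square reproduces the covariance matrix with entries 11, -1 and 1.2.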



What advantages does the SDE formulation have over the ODE? In rare diseases, as in
the next section, including uncertainty is sometimes more appropriate than a purely
deterministic calculation. In addition, SDEs sometimes have properties that cannot be
derived from the theory of ODEs, for example quasi-stationary solutions
(Darroch and Seneta, 1967).

3 Application to Systemic Lupus Erythematosus



In this section the SDE is applied to epidemiological data on systemic lupus
erythematosus in England and Wales. Systemic lupus erythematosus (SLE) is a severe
rheumatic disease with a variety of clinical manifestations. Despite several therapy
options, patients are often heavily restricted in quality of life and ability to work.
Epidemiological data are rare. Here, the incidence data for males and females are taken
from the UK General Practice Research Database (GPRD) for the years 1990-1999 as
reported in (Somers et al., 2007). The mortality m_1 of SLE patients is modeled by the
relative mortality as reported in (Bernatsky et al., 2006). The duration of SLE was not
taken into account.



Regarding the mortality of the non-diseased, we take the mortality in the general
population. Due to the low prevalence of SLE this is legitimate. Then, 5000 solution
paths of the SDE system are simulated by the Euler-Maruyama method (Kloeden and Platen,
1999) and the corresponding age-specific prevalences are calculated. This is done for
males and females separately.
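
The simulation step can be sketched as follows. This is a minimal illustration under
stated assumptions, not the code used for the results below: the three rate functions are
hypothetical toy choices rather than the GPRD and relative mortality data, the cohort
size `n0` is arbitrary, the covariance per unit age is the one implied by (2) with o(Δa)
terms dropped, and clipping S and C at zero is an ad hoc safeguard. The square root of
the 2×2 covariance matrix V is computed in closed form as (V + sI)/t with s = sqrt(det V)
and t = sqrt(tr V + 2s).

```python
import numpy as np

# Hypothetical toy rates per person-year (assumptions, not the data of the paper).
def toy_incidence(a):
    return 5e-5

def toy_m0(a):
    return np.exp(-10.0 + 0.09 * a)

def toy_m1(a):
    return 2.5 * toy_m0(a)

def simulate_paths(inc, m0, m1, n0=1e5, a_max=90.0, da=0.1, n_paths=200, seed=0):
    """Euler-Maruyama paths of (S, C); returns ages and prevalence paths C/(S+C)."""
    rng = np.random.default_rng(seed)
    n = int(round(a_max / da))
    ages = np.linspace(0.0, a_max, n + 1)
    S = np.full(n_paths, float(n0))
    C = np.zeros(n_paths)
    prev = np.zeros((n_paths, n + 1))
    for k in range(n):
        i, mu0, mu1 = inc(ages[k]), m0(ages[k]), m1(ages[k])
        # Entries of the covariance per unit age implied by (2).
        v11, v12, v22 = (mu0 + i) * S, -i * S, mu1 * C + i * S
        # Closed-form square root of a symmetric PSD 2x2 matrix: B = (V + s*I) / t.
        s = np.sqrt(np.clip(v11 * v22 - v12**2, 0.0, None))
        t = np.sqrt(v11 + v22 + 2.0 * s)
        t = np.where(t > 0.0, t, 1.0)      # V = 0 if S = C = 0; avoids division by zero
        b11, b12, b22 = (v11 + s) / t, v12 / t, (v22 + s) / t
        # Wiener increments over one age step.
        dW1, dW2 = rng.standard_normal((2, n_paths)) * np.sqrt(da)
        dS = -(i + mu0) * S * da + b11 * dW1 + b12 * dW2
        dC = (i * S - mu1 * C) * da + b12 * dW1 + b22 * dW2
        # Clipping at zero is an ad hoc assumption to keep the counts non-negative.
        S = np.clip(S + dS, 0.0, None)
        C = np.clip(C + dC, 0.0, None)
        alive = S + C
        prev[:, k + 1] = np.divide(C, alive, out=np.zeros_like(C), where=alive > 0)
    return ages, prev
```

A call such as `simulate_paths(toy_incidence, toy_m0, toy_m1)` returns one prevalence path
per row; the number of paths and the step size are parameters of the sketch.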

As an example, Figure 2 shows the prevalence resulting from two pairs of solution
paths.




Figure 2: Example paths of the age-specific prevalence of SLE resulting from two pairs
of solution paths of system (4). The horizontal axis shows age in years.

Of course, single paths of the prevalence are not that important. It is more interesting
to analyze where the paths of the prevalence lie and what their characteristics are. As
an example, Figures 3 and 4 show the regions where 95% of the 5000 solution paths lie.
The upper and the lower dotted curves indicate the 97.5% and 2.5% quantiles of the 5000
prevalence paths, respectively. That is, for each age a the corresponding quantiles of
the empirical distribution of the 5000 values at age a are calculated. Additionally,
Figures 3 and 4 show the curve of the median (solid line).
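
The pointwise quantile curves can be computed in a few lines. The following sketch
assumes that the simulated prevalence paths are stored in an array with one row per path
and one column per age; the function name `pointwise_band` is hypothetical.

```python
import numpy as np

def pointwise_band(paths, lower=0.025, upper=0.975):
    """Return the lower quantile, median and upper quantile curves, taken age by age.

    paths: array of shape (n_paths, n_ages).
    """
    lo, med, hi = np.quantile(paths, [lower, 0.5, upper], axis=0)
    return lo, med, hi
```

Each returned curve has one value per age, so the three curves can be plotted directly
against the age grid.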




Figure 3: Age-specific prevalence of SLE in males.

Figure 4: Age-specific prevalence of SLE in females.

The median curves indicate the big difference in the prevalence of SLE between males
and females. This is due to the fact that gender is a risk factor for SLE and that the
incidence differs strongly between males and females. The hazard ratio (females vs.
males) is about 10 in the age group of 25-35 years. It decreases to about 5 in the
following age classes until 65 years and falls to about 2 after that.

For an estimate of the burden of SLE in England and Wales one may estimate the
total number L of persons with SLE:

\begin{equation}
L = \sum_{a=0}^{90} p(a)\, N(a),
\tag{5}
\end{equation}

where N(a) denotes the number of persons in England and Wales aged a = 0, ..., 90. The
number N(a) is obtained from official vital statistics for the year 1995
(Office for National Statistics, 2011). The age-specific prevalence p(a) is taken from
the 5000 paths.

Figures 5 and 6 show the distributions of the total number of persons with SLE obtained
from the 5000 paths. Again, the enormous difference between males and females becomes
obvious. While 50.1 thousand females are affected in England and Wales (interquartile
range (IQR): 40.3-60.9), for males the corresponding number is 9.2 (IQR: 6.6-12.5)
thousand.

Figure 5: Histogram of the number of males with SLE.

Figure 6: Histogram of the number of females with SLE.

In the situation described here, it is possible to derive the mean duration D of SLE in
males and females. The mean duration D is the number of person-years of all SLE
patients divided by the total number of persons who ever got the disease:

\begin{equation}
D = \frac{\sum_{a=0}^{90} p(a)\, N(a)}{\sum_{a=0}^{90} i(a)\, \bigl(1 - p(a)\bigr)\, N(a)} .
\tag{6}
\end{equation}

If we calculate this value for all paths p, we find that in males and females the mean
duration is 23.2 (IQR: 16.7-31.5) and 23.9 (IQR: 16.2-29.1) years, respectively. Thus,
the genders do not differ much in that respect. Similarly, the mean age at onset M may
be computed:

\begin{equation}
M = \frac{\sum_{a=0}^{90} a\, i(a)\, \bigl(1 - p(a)\bigr)\, N(a)}{\sum_{a=0}^{90} i(a)\, \bigl(1 - p(a)\bigr)\, N(a)} .
\tag{7}
\end{equation}

The empirical distribution of M in the 5000 paths yields 51.750 (IQR: 51.748-51.752)
and 46.108 (IQR: 46.102-46.113) years in males and females, respectively. It is striking
that M has a relatively low variability in both genders. This is due to the factor
1 - p(a), which is close to unity.
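
For a single prevalence path, the three summary measures of Eqs. (5) to (7) reduce to a
few weighted sums. The sketch below assumes that the prevalence, the incidence and the
population size are given as arrays indexed by integer age; the function and variable
names are hypothetical.

```python
import numpy as np

def burden_measures(p, inc, N):
    """Return (total cases, mean duration, mean age at onset) as in Eqs. (5) to (7).

    p, inc, N: prevalence, incidence rate and population size for ages 0, 1, 2, ...
    """
    p, inc, N = (np.asarray(x, dtype=float) for x in (p, inc, N))
    ages = np.arange(p.size)
    total_cases = np.sum(p * N)                      # Eq. (5)
    new_cases = inc * (1.0 - p) * N                  # incident cases per year, by age
    mean_duration = total_cases / new_cases.sum()    # Eq. (6)
    mean_onset_age = np.sum(ages * new_cases) / new_cases.sum()   # Eq. (7)
    return total_cases, mean_duration, mean_onset_age
```

Applying the function to every simulated path yields the empirical distributions of the
three measures, from which medians and interquartile ranges can be read off.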
4 Conclusion



In the domain of infectious diseases, the theory of deterministic differential equations
was generalized towards stochastic differential equations more than a decade ago
(Allen, 1999). In this article this transformation has been accomplished in the field of
chronic diseases. The numbers of healthy and diseased persons have been modeled by a new
system of two Itô stochastic differential equations. In rare chronic diseases such as
systemic lupus erythematosus, a stochastic formulation might be preferable to a
deterministic one. Even if the incidence and mortality rates are well known, statistical
fluctuations in the number of diseased persons have a strong impact on the age course of
the prevalence. This becomes obvious in Figures 5 and 6, where the distributions of the
total numbers of males and females with SLE in England and Wales in 1995 have been
estimated. The interquartile range spans about 6 thousand males and about 20 thousand
females. Additionally, other disease characteristics have been calculated. The mean age
at onset as derived in our theoretical model is 51.8 and 46.1 years for males and
females, respectively. The corresponding empirical values of 52.2 and 46.3 years observed
in the register data by Somers et al. are in good agreement. Another hint at the
appropriateness of the methods described here comes from the basic epidemiological
equation that overall prevalence equals the product of overall incidence and duration of
the disease (Szklo and Nieto, 2007). The overall prevalence in males and females can
easily be obtained from Eq. (5) and the age pyramid N(a). In our model the overall
prevalence divided by the mean duration (Eq. (6)) yields an overall incidence of 1.58 and
7.95 per 100000 person-years for males and females, respectively. Again, this is close to
the empirically observed values of 1.60 and 8.01 per 100000 person-years
(Somers et al., 2007, Tab. 1). When relating the results of this study to other
epidemiological data from the UK, especially the overall incidence for females appears
high. Hopkinson et al. find a value of only about 6.5 per 100000 person-years (1993).
However, it has to be noted that the data of (Hopkinson et al., 1993) are in a way
inconsistent: if we calculate the mean duration of SLE in females by the basic
epidemiological equation for the data of Hopkinson et al., we find a mean duration of
only about 7 years, which contradicts common survival times of persons with SLE
(Cervera et al., 2003).
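
The consistency check via the basic epidemiological equation can be written out in the
notation of Eqs. (5) and (6). The following lines are a worked step added for
illustration; they use only the definitions of the total number of cases, the mean
duration D and the age pyramid N(a), with all sums running over a = 0, ..., 90.

```latex
\begin{equation*}
\frac{\text{overall prevalence}}{D}
 = \frac{\sum_a p(a)\, N(a)}{\sum_a N(a)} \cdot
   \frac{\sum_a i(a)\, \bigl(1 - p(a)\bigr)\, N(a)}{\sum_a p(a)\, N(a)}
 = \frac{\sum_a i(a)\, \bigl(1 - p(a)\bigr)\, N(a)}{\sum_a N(a)} .
\end{equation*}
```

In words, the overall incidence obtained in this way is the number of new cases per year
divided by the total population.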



Although there is an ongoing debate about the differences between childhood-onset and
adult-onset SLE, as well as about the duration of SLE as a risk factor for comorbidities
and mortality, it has to be noted that reliable epidemiological data about age at onset,
duration of the disease, mean age of the diseased etc. are sparse or lacking. As an
example, the systematic review (Danchenko et al., 2006) about the global burden and
epidemiology of SLE found just one study from Germany, the European country with the
most inhabitants. The associated publication (Zink et al., 2001) was about prevalent
cases, but did not mention a prevalence estimate. Incidence was not addressed.

Theoretical models such as the one presented here may help to at least roughly estimate
the burden and characteristics of rare chronic diseases. This is especially true in
countries with few epidemiological or administrative data. However, the approach
described here has several limitations. First, the stochastic differential equation does
not take into account calendar time trends. In the application to SLE, it has been shown
that the relative mortality of SLE patients undergoes a secular trend
(Bernatsky et al., 2006, Tab. 6). Second, the same publication shows that the relative
mortality in persons with SLE depends on the duration of the disease. The longer a
person is diseased, the more the relative risk decreases. Duration dependence is not
modeled in Eq. (3). Hence, the new approach may be used as an approximation only, and
more evaluations of the method are necessary to examine the validity and applicability
of the model. However, the disease characteristics derived by the new methods in this
work are consistent and indicate an interesting and maybe fruitful way to go.